Posts with tag Digital Humanities
It’s not very hard to get individual texts in digital form. But working with grad students in the humanities looking for large sets of texts to do analysis across, I find that larger corpora are such a hodgepodge as to be almost completely unusable. For humanists and ordinary people to work with large textual collections, they need to be distributed in ways that are actually accessible, not just open access.
I’ve never taken part in the “Day of DH” tradition, where people explain what, exactly, it means to have a job in digital humanities. But today looks to be a pretty DH-full day, so I think, in these last days of Twitter, I’ll give it a shot. (thread)
We’ll start at the beginning–1:30 or so AM, when I finally sent out an e-mail I’d been procrastinating on to the college grants administrator for a public humanities project about immigrant histories I’m running with @ellennoonan and Sibylle Fischer.
We’ve had NYU funding as a Bennett-Polonsky Humanities Lab (https://nyuhumanities.org/program/asylum-h-lab-2020-2021/) to this point, but presenting to the history department last month clarified the value of making one of our primary sorts of records–A-files–more accessible to historians and family researchers.
But that will take some real institutional support, because the stuff we’ve obtained–legally!–from US customs and immigration in our trial run is so shockingly personal in a lot of cases that I can’t really share it yet.
(“Yet” is the wrong word–can’t ethically share in my lifetime, probably. But there are still really important reasons to work on auditing these records especially. If you’re a naturalized citizen or permanent resident and want any help getting your own A-file, let me know!)
OK, skipping to about 9:50 AM. (Late start b/c the first-grader had a school event and my wife teaches Thursday AM.) Today’s first teaching, for my class https://benschmidt.org/WWD22, will focus on 19C directories from the NYPL.
Nick Wolf and @bertspaan digitized these years ago, but there’s more to do with them. A couple weeks ago @SWrightKennedy shared a preview of Columbia’s great new geolocation data about 19C New York… https://mappinghny.com/about/
And yesterday I finally pushed a full pipeline to the GitHub repo that brings the last two weeks of student work together for geo-matching and cleaning these directories: https://github.com/HumanitiesDataAnalysis/Directories. This should allow some amazing analysis of economic geography, name types, etc.
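To make “geo-matching” concrete: the core of a pipeline like this is joining each directory entry’s address string against a geocoded address table on a normalized key. Here’s a minimal sketch–in TypeScript, with invented field names and normalization rules, purely for illustration; the actual pipeline in the repo is the students’ R code and differs in its details:

```ts
// Hypothetical sketch of a geo-matching step. Field names and the
// normalization rules are invented for illustration only.

interface DirectoryEntry {
  name: string;
  occupation: string;
  address: string; // e.g. "125 Broadway Av."
  year: number;
}

interface GeocodedAddress {
  address: string;
  lat: number;
  lon: number;
}

// Normalize an address string so trivial variants still match:
// lowercase, expand common abbreviations, strip punctuation.
function normalize(addr: string): string {
  return addr
    .toLowerCase()
    .replace(/\bst\b\.?/g, "street")
    .replace(/\bave?\b\.?/g, "avenue")
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Join directory entries to the gazetteer on the normalized address,
// keeping only the entries that find a match.
function geoMatch(entries: DirectoryEntry[], gazetteer: GeocodedAddress[]) {
  const index = new Map<string, GeocodedAddress>();
  for (const g of gazetteer) index.set(normalize(g.address), g);
  return entries.flatMap((e) => {
    const hit = index.get(normalize(e.address));
    return hit ? [{ ...e, lat: hit.lat, lon: hit.lon }] : [];
  });
}
```

Most of the real effort lives in the cleaning step: 19C directories abbreviate aggressively and OCR introduces its own variants, so exact joins only get you part of the way.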
So now we’ve got 8.3m individual people for every year from 1850-1889 queued up and ready for a variety of analyses. I want to send the students a map to show how all their R code is paying off, but the deepscatter module is breaking–only one of the filters is working here.
I spend 40 minutes poking at the web code there, trying to refactor it so the interface works right, but this isn’t really relevant to the class right now–more something for the summer, I guess. So I give up and decide to do this DH tweeting instead.
Because of the whole “Twitter is almost over” thing, and some lingering guilt about not blogging enough, I decide that a “Day of DH” post should really be a blog post first–so let’s finally structure some markdown for a twitter thread that can go on benschmidt.org.
It takes a surprising amount of mucking around with the svelte-kit settings to get things publishing correctly, and I have to remember my own markdown naming conventions. But after a few minutes, we’ve got full recursion. https://benschmidt.org/post/2022-03-28-day-of-dh/day-of-dh-22/
Whoops, or not… Time to muck with svelte-kit a little more…
Well, this is embarrassing but typical. Turns out there was a bug in the bleeding-edge svelte-kit build that broke trailing-slash behavior in URLs, so that ‘https://benschmidt.org/post/2022-03-19-better-texts/’ resolved differently from ‘https://benschmidt.org/post/2022-03-19-better-texts’. Finally fixed.
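For anyone debugging something similar: in the pre-1.0 svelte-kit of this era, trailing-slash behavior is configured in svelte.config.js via the kit.trailingSlash option. A minimal sketch of that setting–my reconstruction for orientation, not the actual fix, since the bug here was in svelte-kit’s build itself:

```js
// svelte.config.js – a minimal sketch, assuming the pre-1.0
// `kit.trailingSlash` option; not the actual fix for this bug.

/** @type {import('@sveltejs/kit').Config} */
const config = {
  kit: {
    // 'always' redirects /post/foo to /post/foo/, so each page has
    // exactly one canonical URL form.
    trailingSlash: 'always'
  }
};

export default config;
```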
Insane levels of debugging are a real pain and an occupational hazard. But I don’t know how anyone could responsibly teach this stuff without these frequent rabbit holes. Every one of them is kind of interesting and builds up the ability to fix others’ code…
This article in the New Yorker about the end of genre prompts me to share a theory I’ve had for a year or so: the recommendation models at Spotify, Netflix, etc., are most likely not just removing artificial silos that old media companies imposed on us, but actively destroying genre without much pushback. I’m curious what you think.
(This is a talk from a January 2019 panel at the annual meeting of the American Historical Association. You probably need to know, to read it, that the MLA conference was simultaneously taking place about 20 blocks north.)
Critical Inquiry has posted an article by Nan Da offering a critique of some subset of digital humanities that she calls “Computational Literary Studies,” or CLS. The premise of the article is to demonstrate the poverty of the field by showing that the new structure of CLS is easily dismantled by the master’s own tools. It appears to have succeeded at gaining enough attention that it clearly does some kind of work far out of proportion to the merits of the article itself.
I’ve gotten a couple e-mails this week from people asking advice about what sort of computers they should buy for digital humanities research. That makes me think there aren’t enough resources online for this, so I’m posting my general advice here. (For some other solid perspectives, see here.) For keyword optimization I’m calling this post “digital humanities,” but, obviously, I really mean the subset that is humanities computing–what I tend to call humanities data analysis. Moreover, the guidelines here are specifically tailored for text analysis; if you are working with images, you’ll have somewhat different needs (in particular, you may need a better graphics card). If you do GIS, god help you. I don’t do any serious social network analysis, but I think the guidelines below should work relatively well with Gephi.
Practically everyone in Digital Humanities has been posting increasingly epistemological reflections on Matt Jockers’ Syuzhet package since Annie Swafford posted a set of critiques of its assumptions. I’ve been drafting and redrafting one myself. One of the major reasons I haven’t posted it is that the obligatory list of links keeps growing. Suffice it to say that this is not a broad methodological disputation, but rather a single idea crystallized after reading Scott Enderle on “sine waves of sentiment.” I’ll say what this all means for the epistemology of the Digital Humanities in a different post, to the extent that that’s helpful.